What we’ll do today

By now you’ve come a long way with your spatial data: you can visualise it using maps, and present the values of a variable using thematic maps. You’ve also had some practice taking data with a spatial component and joining it to a shapefile using a common column, in order to visually demonstrate how something (in our case, crime rate) varies across space.

I hope that you are finding this to be really exciting stuff, and an opportunity to get accustomed to spatial data. Today we will go a step further and get your hands dirty with the spatial manipulation of your data.

Thus far, our data manipulation exercises have been ones you might be familiar with from your earlier exposure to data analysis. Linking datasets using a common column, or calculating a new variable (a new column) from the values of existing variables, are tasks you can perform on spatial and non-spatial data alike. Today, however, we will explore some data manipulation exercises which are specific to spatial data analysis. After this session you can truly say you are masters of spatial data manipulation. So let’s get started!

The main objectives for this session are that by the end you will have:

These are all very useful tools for the spatial crime analyst, and we will hope to demonstrate this by working through an example project, where you would make use of all of these tools.

Let’s consider the assumption that licensed premises which serve alcohol are associated with increased crime.

[something about crime attractors/ generators/ etc here]

So let’s say we want to examine whether certain outlets have more crimes near them than others. We can do this using open data, some R code, and the spatial operations discussed above. So let’s get to it!

Getting some (more) data

Manchester City Council have an Open Data Catalogue on their website, which you can use to browse the sorts of data they release to the public. Some of the data sets made available here are more interesting than others. It’s not quite as impressive as the open data from some US cities such as New York or Dallas, but we’ll take it.

One interesting data set, especially for our questions about the different alcohol outlets, is the Licensed Premises data set. This details all the currently active licensed premises in Manchester. You can see there is a link to download the data.

As always, there are a few ways you can download this data set. On the manual side of things, you can simply right click on the download link from the website, save it to your computer, and read it in from there, by specifying the file path. Remember, if you save it in your working directory, then you just need to specify the file name, as the working directory folder is where R will first look for this file. If however you’ve saved this elsewhere, you will need to work out the file path.

Note: my favourite shortcut for finding a file path is simply to run the file.choose() function, and use the popup window to navigate to the file. When you open the file through the popup window, if you’re not assigning the result to an object, it will simply print the file path to your console window. Like so:

You can then copy and paste this path into whatever function you are using to read in your data.
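As a minimal sketch of this workflow (the path in the comment is hypothetical; yours will depend on where you saved the file):

```r
# Opens a file picker; since the result is not assigned to an object,
# the chosen file's full path is printed to the console, e.g.
# "C:/Users/me/Documents/licensed_premises.csv"
file.choose()

# You can also skip the copy-paste step entirely and read the
# chosen file directly:
my_data <- read.csv(file.choose())
```

Note this is interactive: it only works when you are sitting at the console, which is one reason scripts usually spell the path out instead.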

Reading data in from the web

But programmers are lazy, and the whole point of using code-based interfaces is that we get to avoid doing unnecessary work, like point-and-click downloading of files. When data exists online in a suitable format, we can tell R to read the data in from the web directly, and cut out the middle man (that being ourselves, in our pointing-and-clicking capacity).

How can we do this? Well, think about what we do when we read in a file. We say: hello R, I would like to create a new object please, and I will call this new object my_data. We do this by typing the name we are giving the object and the assignment operator <-. Right? Then on the right-hand side of the assignment operator is the value we are assigning to the object. It could be a bit of text (such as when you create a shp_name object and pass it the string "path to my file"), or it could be the result of some function, for example when you read a csv file with the read.csv() function.

So if we’re reading a csv, we also need to specify where to read the csv from. Where should R look to find this data? This is where you would normally put the path to your file, right? Something like:

my_data <- read.csv("path to my file here")

Well what if your data does not live on your laptop or PC? Well, if there is a way that R can still access this data just by following a path, then this approach will still work! So how can we apply this to getting the Licensed Premises data from the web?

You know when you right click on the link and select “Save As…” (or whatever you click on to save)? You could also select “Copy Link Address”. This just copies the web address where the data is stored. Give it a go! Copy the address and paste it into your browser. It will take you to a blank page where a forced download of the data will begin. So what if you pasted this address into the read.csv() function?

my_data <- read.csv("www.data.com/the_data_i_want")

Well, in this case the my_data object would be assigned the value returned by the read.csv() function reading in the file from the URL you provided. A file path is no mysterious thing; it is simply the path to the file you want to read. If that happens to be a web address, then so be it.

So without dragging this on any further, let’s read in the licensed premises data directly from the web:

lic_prem <- read.csv("http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv")

You can always check whether this worked by looking at your global environment on the right-hand side and seeing if the lic_prem object has appeared. If it has, you should see it has 65535 observations (rows) and 36 variables (columns).

Let’s have a look at what this data set looks like. You can use the View() function for this:

View(lic_prem)

We can see there are some interesting, and perhaps some less interesting, columns in there. There are quite a lot of venues in this list as well, so let’s think about subsetting them. Let’s say we’re interested in Manchester city centre. We can see that there is a column for postcodes, and we know (from our local domain knowledge) that city centre postcodes are M1-M4. So let’s start by subsetting the data to include these.

Subsetting using pattern matching

We could use spatial operations here: geocode all the postcodes at this point, then use a spatial file of the city centre to select only the points contained in that area. The only reason we’re not doing this is that the geocode function takes a bit of time to geocode each address. It would only be about 10-15 minutes, but we don’t want to leave you sitting around in the lab for that long, so instead we will subset the data using pattern matching on text. In particular, we will use the grepl() function. This function takes a pattern and looks for it in some text: if it finds the pattern it returns TRUE, and if it does not, it returns FALSE. So you pass two parameters to the grepl() function: the pattern you want it to look for, and the object in which to search.

So for example, if we have an object that is some text, and we want to find if it contains the letter “a”, we would pass those inside the grepl() function, which would tell us TRUE (yes it’s there) or FALSE (no it’s not there):

some_text <- "this is some text that has some letter 'a's"

grepl("a", some_text)
## [1] TRUE

You can see this returns TRUE, because there is at least one occurrence of the letter a. If there wasn’t, we’d get FALSE:

some_text <- "this is some text tht hs some letters"

grepl("a", some_text)
## [1] FALSE

So we can use this to select all the cases where we find the pattern “M1 ” in the postcode. NOTICE the space in our search pattern: it’s not “M1”, it’s “M1 ” (with a trailing space). Can you guess why?

Well, the pattern “M1” would be found in M1 postcodes, but also in M13, which is the University of Manchester’s postcode and not the part of the city centre in which we are interested. The trailing space ensures we only match the M1 outward code.
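You can check this quickly in the console (the postcodes here are just illustrative):

```r
# "M1" on its own also matches inside "M13":
grepl("M1", "M13 9PL")    # TRUE - would wrongly include university postcodes

# With the trailing space, only genuine M1 postcodes match:
grepl("M1 ", "M13 9PL")   # FALSE
grepl("M1 ", "M1 4BT")    # TRUE
```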

So let’s subset our data by creating a new object city_centre_prems, and using the piping (%>%) and filter() functions from the dplyr package:

#remember to load dplyr package if you haven't already: 
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#then create the city_centre_prems object:
city_centre_prems <- lic_prem %>%
  filter(grepl("M1 ", POSTCODE) )

Now we only have 284 observations (see your global environment), which is a much more manageable number.

Geocoding from an address

Great, OK: so we have this list of licensed premises, and we have their addresses, which are clearly some sort of spatial information. But how would you put these on a map?

Any ideas?

We can use the geocode() function from the ggmap package to turn our addresses into mappable coordinates. geocode() geocodes a location (finds its latitude and longitude) using either (1) the Data Science Toolkit or (2) Google Maps. Note that when using Google you are agreeing to the Google Maps API Terms of Service at https://developers.google.com/maps/terms. For example, one of the conditions of using the Google API to geocode your points is that you have to display the results on a Google basemap!

We can, at the most basic, geocode the postcode. This will place all the establishments at the centroid of the postcode. Postcodes are alphanumeric codes used in the United Kingdom, devised by Royal Mail. A full postcode is known as a “postcode unit” and designates an area with a number of addresses or a single major delivery point. You can search the Royal Mail for info on postcodes here.

Here is a map of the postcode areas in Greater Manchester:

Now, the centroid of a postcode area represents the central point of its polygon. For example, here you can see some polygons with their centroids illustrated by points:

This is not quite as precise as geocoding the actual address, and we will return to this in the homework, but let’s just stick with this approach for now.

So geocode() will help us get the coordinates for the relevant postcode centroid. First, though, we have to specify in the address which country our postcode is in. Just like when you mail a postcard (people still do this, right?), you have to specify which country you want it to go to, and then specify the postcode and address. So we will create a new variable (column) in our dataframe that pastes together the postcode with a suffix specifying our country, in this case “, UK”. To do this, we use the paste() function. paste() just pastes together two or more text values, separating them by whatever separator you want. For example, if I wanted to display people’s names in different formats, I could use paste in this way:

firstname <- "Kurt"
lastname <- "Cobain"

#this way will paste lastname and firstname, and separate them by a comma
paste(lastname, firstname, sep = ",")
## [1] "Cobain,Kurt"
#this way will paste firstname then lastname, and separate them by a space
paste(firstname, lastname, sep = " ")
## [1] "Kurt Cobain"

So in the same vein, we will now create a new column, call it postcodes2, and use the paste function to put together the postcode with the “, UK” suffix. We want them to be separated by nothing, so we use sep = "". (Note: if you are separating by nothing, you could use the paste0() function; you can read about this using the help function if you want.)

city_centre_prems$postcodes2 <- paste(city_centre_prems$POSTCODE, ", UK", sep="")
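As a quick sketch of the paste0() alternative mentioned above (the postcode vector here is made up for illustration):

```r
postcodes <- c("M1 4BT", "M1 2EQ")

# paste0() is simply paste() with sep = "", so these two calls are equivalent:
paste(postcodes, ", UK", sep = "")   # "M1 4BT, UK" "M1 2EQ, UK"
paste0(postcodes, ", UK")            # same result
```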

Now we can use this new column to geocode our addresses. I mentioned that the geocode() function is part of the ggmap package. This means we have to load up this package to use this function:

library(ggmap)
## Loading required package: ggplot2

Now we can use geocode(). To use it, you have to specify what you want to geocode (in this case our newly created postcodes2 column) and also the source. As mentioned above, you can use Google or the Data Science Toolkit. Google puts a restriction on the number of geocodes you can perform in a day, so I normally use dsk, but both have advantages and limitations, so read up on this if you’re interested. For now, I’m sticking with dsk.

So let’s create a new column, call it postcode_coords, and use the geocode function to populate it with values:

city_centre_prems$postcode_coords <- geocode(city_centre_prems$postcodes2, source = 'dsk')
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%205JG,%20UK&sensor=false
## ... (one such line is printed for each remaining postcode; output truncated)
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202FN,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202EA,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%204RJ,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%207ED,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%205LN,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201DN,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202PF,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201LU,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%205AZ,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%206ET,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%204QA,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202AE,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202HN,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202EG,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201DW,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201FY,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%204DY,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%203JF,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%206EY,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201LE,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%203HP,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%205BY,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%201FL,%20UK&sensor=false
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=%20M1%202NP,%20UK&sensor=false

Be patient: this will take a while, as each postcode has to be looked up in the geocoder's database and the relevant coordinates extracted. For each point you will see a message appear in red, and while R is working you will see the red stop sign in the top right corner of the Console window:

Also think about how incredibly fast and easy this actually is, compared to a potential alternative where you have to manually find coordinates for each address. That sounds pretty awful, doesn’t it? Setting the geocode() function running and stepping away to make a cup of tea is a pretty excellent alternative, no?
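Incidentally, there is nothing magic about those long URLs in the output above: the geocoder query is just the address percent-encoded into a query string. A minimal base-R sketch of that encoding step (using utils::URLencode(); this is an illustration, not ggmap's actual internals):

```r
# The geocoder query is just the address percent-encoded into a URL.
# This reproduces the encoding step only, not ggmap's internals.
base_url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json?address="

make_query <- function(address) {
  # URLencode() turns spaces into %20 but leaves commas alone by default
  paste0(base_url, URLencode(address), "&sensor=false")
}

make_query(" M1 1AF, UK")
```

Compare the result with the first line of output above: the leading space and the space inside the postcode become %20, which is exactly what you see printed while geocode() runs.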

Note that there is a chance you will get an error telling you that you are over the geocode limit allowed by Google. This appears to be a bug that has been flagged. There is a workaround: pass some phoney credentials as part of the geocode() function call. This stackoverflow discussion is where I found it. So if you’re experiencing this issue, try running the below:

city_centre_prems$postcode_coords <- geocode(city_centre_prems$postcodes2, client = "123", signature = "123", output = 'latlon', source = 'dsk')

Hopefully now you should be getting your points individually geocoded.

Right, so once that is done, have a look at your data again to see what this new column looks like. Remember you can use the View() function to open your data in the viewer:

View(city_centre_prems)

You will see that we have some coordinates. Woohoo! Let’s “flatten” them out, so we can use them to map in a moment.

city_centre_prems$longitude <- city_centre_prems$postcode_coords$lon
city_centre_prems$latitude <- city_centre_prems$postcode_coords$lat

And now we have a column called longitude for longitude and a column called latitude for latitude. Neat!
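If you are wondering why this “flattening” was needed: geocode() here returned the coordinates as a data frame nested inside a single column, so postcode_coords itself contains lon and lat. A toy base-R illustration of that nested structure (with made-up coordinates, not your real data):

```r
# Toy example: a data-frame column nested inside another data frame,
# mimicking the structure geocode() returned (coordinates are made up)
prems <- data.frame(postcodes2 = c("M1 1AF", "M1 4BT"))
prems$postcode_coords <- data.frame(lon = c(-2.236, -2.241),
                                    lat = c(53.477, 53.484))

# "Flatten" the nested columns out into ordinary top-level columns
prems$longitude <- prems$postcode_coords$lon
prems$latitude  <- prems$postcode_coords$lat

prems$longitude  # now a plain numeric vector
```

Mapping functions expect plain numeric columns, which is why we pull lon and lat out to the top level before going any further.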

Making interactive maps with leaflet

Thus far we have explored a few approaches to making maps. We made great use of the tmap package, for example, over the past few weeks. Now we will introduce another approach: this is a super brief intro to some of the cool things you can do with the leaflet package. There are comprehensive tutorials available online, for example here.

Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps. It is very popular, used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.

In this section of the lab we will learn how to make really flashy looking maps using leaflet.

You will need to have installed the following packages to follow along:

install.packages("leaflet") #for mapping
install.packages("RColorBrewer") #for getting nice colours for your maps

Once you have them installed, load them up with the library() function:

Making a map

To make a map, just load the leaflet library:

library(leaflet)

You then create a map with this simple bit of code:

m <- leaflet() %>%
  addTiles()  

And just print it:

m  

Adding some content:

You might of course want to add some content to your map.

Adding points manually:

You can add a point manually:

m <- leaflet() %>%
  addTiles()  %>% 
  addMarkers(lng=-2.230899, lat=53.464987, popup="You are here")
m  

Or many points manually:

latitudes = c(53.464987, 53.472726, 53.466649) 
longitudes = c(-2.230899, -2.245481, -2.243421) 
popups = c("You are here", "Here is another point", "Here is another point") 
df = data.frame(latitudes, longitudes, popups)      

m <- leaflet(data = df) %>%
  addTiles()  %>%  
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
m  

Change the basemap

You can change the background as well. You can find a list of different basemaps here.

m <- leaflet(data = df) %>%
  addProviderTiles("Stamen.Toner") %>% 
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
m  

Adding data from elsewhere

You will most likely want to add data to your map from external sources, rather than manually creating points. In this case, we want to map our licensed premises in the city centre, right? So let’s do this:

m <- leaflet(data = city_centre_prems) %>%
  addProviderTiles("Stamen.Toner") %>% 
  addMarkers(lng=~longitude, lat=~latitude, popup=~as.character(PREMISESNAME), label = ~as.character(PREMISESNAME))
m  

Cool no? Now let’s say you wanted to save this map. You can do this by clicking on the export button at the top of the plot viewer, and choose the Save as Webpage option saving this as a .html file:

Then you can open this file with any web browser (Safari, Firefox, Chrome) and share your map that way. You can send this to your friends not on this course, and make them jealous of your fancy map-making skills.

One thing you might have noticed is that we still have some points that are not in Manchester. This should illustrate that the pattern matching approach is really just a work-around. Instead, what we really should be doing to subset our data spatially is to use spatial operations. So now we’ll learn how to do some of these in the next section.

Spatial operations

Spatial operations are a vital part of geocomputation. Spatial objects can be modified in a multitude of ways based on their location and shape. For a comprehensive overview of spatial operations in R I would recommend the relevant chapter, Chapter 4: Spatial Operations, from Geocomputation with R by Robin Lovelace and Jakub Nowosad.

Spatial operations differ from non-spatial operations in some ways. To illustrate the point, imagine you are researching road safety. Spatial joins can be used to find road speed limits related to administrative zones, even when no zone ID is provided. But this raises the question: should the road fall completely inside a zone for its values to be joined? Or is simply crossing it, or being within a certain distance, sufficient? When posing such questions it becomes apparent that spatial operations differ substantially from attribute operations on data frames: the type of spatial relationship between objects must be considered.
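To make the “which relationship?” question concrete, here is a deliberately simplified toy in base R: a “road” and a “zone” reduced to one-dimensional intervals. The three tests (completely inside, crossing, within a distance) give different answers for the same pair of objects, which is the same choice that functions like st_within(), st_intersects() and st_is_within_distance() in sf represent for real geometries (this is a sketch of the idea, not sf code):

```r
# A 1-D toy: the zone covers positions 0..10, the road runs from 8 to 12
zone <- c(0, 10)
road <- c(8, 12)

# Is the road completely inside the zone?
completely_within <- road[1] >= zone[1] && road[2] <= zone[2]

# Does the road at least cross (overlap) the zone?
intersects <- road[1] <= zone[2] && road[2] >= zone[1]

# Is the road within 3 units of the zone (gap between them <= 3)?
gap <- max(road[1] - zone[2], zone[1] - road[2], 0)
within_distance <- gap <= 3

c(completely_within, intersects, within_distance)  # FALSE TRUE TRUE
```

The same road is joined to the zone under two of the three rules and excluded under the third, so the choice of relationship genuinely changes your results.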

So you can see we can do exciting spatial operations with our spatial data, which we cannot with the non-spatial stuff.

For our spatial operations we will be using functions that belong to the sf package. So make sure you have this loaded up:

library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3

Projections revisited

One important note before we begin brings us back to some of the learning from the second session, on map projections. Remember talking about this? We spoke about all the ways of flattening out the earth, what that means for maps, and how the most common ones we will use are WGS 84 and British National Grid. If this is bringing back no memories at all, please go back to your lab notes from the second session to review.

Now that you’ve reviewed (or knew this all along) we can come back to our important note.

It is important to note that spatial operations that use two spatial objects rely on both objects having the same coordinate reference system.

If we are looking to carry out operations that involve two different spatial objects, they need to have the same projection!!! Funky weird things happen when this condition is not met, so beware!

So how do we know what projection our spatial objects are in? Well, the sf package contains a handy function called st_crs() which lets us check. All you need to pass into the brackets of this function is the name of the object whose projection you want to know.

So let’s check what is the CRS of our licenced premises:

st_crs(city_centre_prems)
## $epsg
## [1] NA
## 
## $proj4string
## [1] NA
## 
## attr(,"class")
## [1] "crs"

You can see that we get the projection returned as NA. Can you think of why? Have we made this into a spatial object? Or is this merely a dataframe with a latitude and longitude column? The answer is really in the question here.

So we need to convert this to a sf object, or a spatial object, and make sure that R knows that the latitude and the longitude columns are, in fact, coordinates.

In the st_as_sf() function we specify what we are transforming (the name of our dataframe), the column names that hold the coordinates (longitude and latitude), and the CRS we are using (4326 is the EPSG code for WGS 84, the projection that uses latitude and longitude coordinates; remember BNG uses Easting and Northing). Finally, agr, the attribute-geometry-relationship, specifies for each non-geometry attribute column how it relates to the geometry, and can take one of the following values:

  • “constant” is used for attributes that are constant throughout the geometry (e.g. land use)
  • “aggregate” is used where the attribute is an aggregate value over the geometry (e.g. population density or population count)
  • “identity” is used when the attribute uniquely identifies the geometry of a particular “thing”, such as a building ID or a city name

The default value, NA_agr_, implies we don’t know.

cc_spatial = st_as_sf(city_centre_prems, coords = c("longitude", "latitude"), 
                 crs = 4326, agr = "constant")

Now let’s check the projection of this spatial version of our licensed premises:

st_crs(cc_spatial)
## $epsg
## [1] 4326
## 
## $proj4string
## [1] "+proj=longlat +datum=WGS84 +no_defs"
## 
## attr(,"class")
## [1] "crs"

We can now see that we have this coordinate system as WGS 84. We need to then make sure that any other spatial object with which we want to perform spatial operations is also in the same projection.

Meet a new format of shapefile: geojson

GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JSON, the JavaScript Object Notation. It is a format for encoding a variety of geographic data structures.

Geometries are shapes. All simple geometries in GeoJSON consist of a type and a collection of coordinates. The features include points (therefore addresses and locations), line strings (therefore streets, highways and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types. GeoJSON features need not represent entities of the physical world only; mobile routing and navigation apps, for example, might describe their service coverage using GeoJSON.
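To give you a feel for the format, here is a minimal, made-up GeoJSON file describing a single point with one non-spatial attribute; polygons look the same, just with nested arrays of coordinates:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-2.244, 53.479]
      },
      "properties": {
        "name": "An example point in Manchester"
      }
    }
  ]
}
```

Note the coordinate order: GeoJSON always lists longitude first, then latitude.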

To tinker with GeoJSON and see how it relates to geographical features, try geojson.io, a tool that shows code and visual representation in two panes.

Let’s read in a geoJSON spatial file, again from the web. This particular geojson represents the wards of Greater Manchester.

manchester_ward <- st_read("https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson")
## Reading layer `OGRGeoJSON' from data source `https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson' using driver `GeoJSON'
## Simple feature collection with 215 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 351664 ymin: 381168.6 xmax: 406087.5 ymax: 421039.8
## epsg (SRID):    27700
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs

Let’s select only the city centre ward, using the filter() function from dplyr

city_centre <- manchester_ward %>%
  filter(wd16nm == "City Centre")

Let’s see how this looks, using the plot() function:

plot(st_geometry(city_centre))

Now we could use this to make sure that our points included in cc_spatial are in fact only licensed premises in the city centre. This will be your first spatial operation. Excited? Let’s do this!

Subset points to those within a polygon

So we have our polygon, our spatial file of the city centre ward. We now want to subset our point data, the cc_spatial data, which has points representing licensed premises.

First things first, we check whether they have the same crs.

st_crs(city_centre) == st_crs(cc_spatial)
## [1] FALSE

Uh oh! They do not! So what can we do? Well we already know that cc_spatial is in WGS 84, because we made it so a little bit earlier. What about this new city_centre polygon?

st_crs(city_centre) 
## $epsg
## [1] 27700
## 
## $proj4string
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs"
## 
## attr(,"class")
## [1] "crs"

Aha, the key is in the 27700. This code in fact stands for…. British National Grid…!

So what can we do? We can reproject our spatial object. Yep, we can convert between projections.

So let’s do this now. To do this, we can use the st_transform() function.

cc_WGS84 <- st_transform(city_centre, 4326)

Let’s check that it worked:

st_crs(cc_WGS84) 
## $epsg
## [1] 4326
## 
## $proj4string
## [1] "+proj=longlat +datum=WGS84 +no_defs"
## 
## attr(,"class")
## [1] "crs"

Looking good. Triple double check:

st_crs(cc_WGS84) == st_crs(cc_spatial)
## [1] TRUE

YAY!

Now we can move on to our spatial operation, where we select only those points within the city centre polygon. To do this, we can use the st_intersects() function:

# intersection
cc_intersects <- st_intersects(cc_WGS84, cc_spatial)
## although coordinates are longitude/latitude, it is assumed that they are planar
# subsetting
cc_intersects <- cc_spatial[unlist(cc_intersects),]

Have a look at this new cc_intersects object in your environment. How many observations does it have? Is this fewer than in the previous cc_spatial object? Why do you think this is?

(hint: you’re removing everything that is outside the city centre polygon)

We can plot this too to have a look:

# plot
plot(st_geometry(cc_WGS84), border="#aaaaaa")
plot(st_geometry(cc_intersects), col = "red", add=T)

COOL, we have successfully performed our first spatial operation: we managed to subset our points data set to include only those points which are inside the polygon for the city centre. See how this was much easier, and more reliable, than the hacky workaround using pattern matching? Yay!

Spatial operations two: building buffers

Right, now let’s go back to our original question. We want to know about crime in and around our areas of interest, in this case our licensed premises. But how can we count this?

Well, first we will need crime data. Let’s use the same data set from last week. I’m not going over the detail of how to read this in; if you forgot, go back to the notes from last week.

crimes <- read.csv("https://raw.githubusercontent.com/maczokni/2018_labs/master/data/2017-11-greater-manchester-street.csv")

Now let’s make sure again that R is aware that this is a spatial set of points, and that the columns Longitude and Latitude contain the coordinates:

crimes_spatial = st_as_sf(crimes, coords = c("Longitude", "Latitude"), 
                 crs = 4326, agr = "constant")

Notice that in this case the columns are spelled with an upper case “L”. You should always familiarise yourself with your data set to make sure you are using the relevant column names. You can see just the column names using the names() function, like so:

names(crimes)
##  [1] "Crime.ID"              "Month"                
##  [3] "Reported.by"           "Falls.within"         
##  [5] "Longitude"             "Latitude"             
##  [7] "Location"              "LSOA.code"            
##  [9] "LSOA.name"             "Crime.type"           
## [11] "Last.outcome.category" "Context"

Or you can have a look at the first 6 lines of your dataframe with the head() function:

head(crimes)
##                                                           Crime.ID   Month
## 1                                                                  2017-11
## 2 f892dce3e7a4c45fe4f8f09f24d6a494f2b49783a976afba3d6f51e01f72521f 2017-11
## 3 f48d18a3e88eaeba043807cd95642dccb2a2e007021de06e55956c3ee7bad0a4 2017-11
## 4                                                                  2017-11
## 5 bee6bb133a8ae0b35cf19d2d994ad2bdc3852d1d0d560ce8510359dd02d2e503 2017-11
## 6 7f5838988383f952ec1445fb266cd731d5fa32831c3cc27c234f2a3696dfc2d7 2017-11
##                 Reported.by              Falls.within Longitude Latitude
## 1 Greater Manchester Police Greater Manchester Police -2.462774 53.62210
## 2 Greater Manchester Police Greater Manchester Police -2.462774 53.62210
## 3 Greater Manchester Police Greater Manchester Police -2.462774 53.62210
## 4 Greater Manchester Police Greater Manchester Police -2.464422 53.61250
## 5 Greater Manchester Police Greater Manchester Police -2.444807 53.61151
## 6 Greater Manchester Police Greater Manchester Police -2.444043 53.62939
##                  Location LSOA.code                  LSOA.name
## 1   On or near Scout Road E01012628 Blackburn with Darwen 018D
## 2   On or near Scout Road E01012628 Blackburn with Darwen 018D
## 3   On or near Scout Road E01012628 Blackburn with Darwen 018D
## 4 On or near Parking Area E01004768                Bolton 001A
## 5 On or near Belmont Road E01004768                Bolton 001A
## 6    On or near West Walk E01004803                Bolton 001B
##                     Crime.type
## 1        Anti-social behaviour
## 2    Criminal damage and arson
## 3    Criminal damage and arson
## 4        Anti-social behaviour
## 5 Violence and sexual offences
## 6                  Other theft
##                           Last.outcome.category Context
## 1                                                    NA
## 2 Investigation complete; no suspect identified      NA
## 3                   Unable to prosecute suspect      NA
## 4                                                    NA
## 5                           Under investigation      NA
## 6 Investigation complete; no suspect identified      NA

Or you can view, with the View() function.

Now, we have our points that are crimes, right? Well… How do we connect them to our points that are licensed premises?

One approach is to build a buffer around each licensed premise, and count all the crimes which fall within a specific radius of it. What should this radius be? Well, this is where your domain knowledge as a criminologist comes in. How far away would you consider a crime to still be related to this pub? 400 meters? 500 meters? 900 meters? 1 km? What do you think? This is again one of those “it depends” questions. Whatever buffer you choose, you should justify it and be able to defend it when someone asks, because the further your reach, the more crimes you will include, and this might alter your results.

So, let’s say we are interested in all crimes that occur within 400 meters of each licensed premise. We chose 400m here as this is the recommended distance in accessible bus stop guidance, so basically as far as people should have to walk to get to a bus stop (TfL, 2008). So in this case, we want to take our points, which represent the licensed premises, and build buffers of 400 meters around them.

You can do this with the st_buffer() function:

prem_buffer <- st_buffer(cc_intersects, 1)
## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs): st_buffer does
## not correctly buffer longitude/latitude data, dist needs to be in decimal
## degrees.

You should get a warning here, like I did above. This message indicates that sf assumes the distance value is given in degrees. This is because we have lat/long data (WGS 84).
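Why are degrees such a bad unit for buffering? Because how many meters one degree covers depends on where you are on the globe: a degree of longitude shrinks as you move towards the poles. A quick back-of-the-envelope check in base R (111,320 m is the approximate length of one degree of longitude at the equator):

```r
# Approximate meters per degree of longitude at Manchester's latitude
lat_manchester <- 53.48
m_per_deg_lon <- 111320 * cos(lat_manchester * pi / 180)
m_per_deg_lon        # roughly 66,000 m per degree here
400 / m_per_deg_lon  # 400 m is only ~0.006 of a degree
```

So if st_buffer() interpreted dist = 400 as degrees, the "buffer" would span thousands of kilometres, which is clearly not what we want.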

One quick fix to avoid this message is to convert to BNG:

prem_BNG <- st_transform(cc_intersects, 27700)

Now we can try again, with meters

prem_buffer <- st_buffer(prem_BNG, 400)

Let’s see how that looks:

plot(st_geometry(prem_buffer))
plot(st_geometry(prem_BNG), add = T)

That should look nice and squiggly. But it also looks like there is quite a lot of overlap here. Should we maybe consider smaller buffers? Let’s look at 100 meter buffers:

prem_buffer_100 <- st_buffer(prem_BNG, 100)
plot(st_geometry(prem_buffer_100))
plot(st_geometry(prem_BNG), add = T)

Still quite a bit of overlap, but this is possibly down to all the licensed premises being packed so densely together in the city centre.

Well now let’s have a look at our crimes. I think it might make sense (again using domain knowledge) to restrict to violent crime. So let’s do this:

violent_spatial <- crimes_spatial %>%
  filter(Crime.type=="Violence and sexual offences")

Now, remember the CRS is WGS 84 here, so we will need to convert our buffer layer back to this:

buffer_WGS84 <- st_transform(prem_buffer_100, 4326)

Now let’s just have a look:

plot(st_geometry(buffer_WGS84))
plot(st_geometry(violent_spatial), add = T)

OKAY, so some crimes fall inside some buffers, others not so much. Well, let’s get to our last spatial operation of the day, the famous points in polygon, to answer which licensed premises have the most violent crimes near them.

Points in Polygon

When you have a polygon layer and a point layer, and want to know how many (or which) of the points fall within the bounds of each polygon, you can use this method of analysis. In computational geometry, the point-in-polygon (PIP) problem asks whether a given point in the plane lies inside, outside, or on the boundary of a polygon. As you can see, this is quite relevant to our problem, since we want to count how many crimes (points) fall within 100 meters of our licensed premises (our buffer polygons).
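Under the hood, a PIP test can be done with the classic ray-casting trick: shoot a horizontal ray out from the point and count how many polygon edges it crosses; an odd count means the point is inside. A minimal base-R sketch of the idea, for intuition only (st_join() and st_intersects() in sf do this for real, for all geometry types):

```r
# Toy ray-casting point-in-polygon test (base R), for illustration only
point_in_polygon <- function(px, py, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the horizontal ray from (px, py) cross edge i-j?
    if (((poly_y[i] > py) != (poly_y[j] > py)) &&
        (px < (poly_x[j] - poly_x[i]) * (py - poly_y[i]) /
              (poly_y[j] - poly_y[i]) + poly_x[i])) {
      inside <- !inside  # each crossing flips inside/outside
    }
    j <- i
  }
  inside
}

# A unit square and two test points
sq_x <- c(0, 1, 1, 0); sq_y <- c(0, 0, 1, 1)
point_in_polygon(0.5, 0.5, sq_x, sq_y)  # TRUE: inside
point_in_polygon(2,   0.5, sq_x, sq_y)  # FALSE: outside
```

You will never need to write this yourself: the sf code below does the point-in-polygon work (and the counting) in one pipeline.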

crimes_per_prem <- violent_spatial %>% 
  st_join(buffer_WGS84, ., left = FALSE) %>% 
  count(PREMISESNAME)
## although coordinates are longitude/latitude, it is assumed that they are planar

You now have a new dataframe, crimes_per_prem, which has a column for the name of the premises, a column for the number of violent crimes that fall within the buffer, and a column for the geometry.

Take a moment to look at this table. Use the View() function. Which premises have the most violent crimes? Are you surprised?

Now as a final step, let’s plot this, going back to leaflet:

pal <- colorBin("RdPu", domain = crimes_per_prem$n, bins = 5, pretty = TRUE)
leaflet(crimes_per_prem) %>% 
  addTiles() %>% 
  addPolygons(fillColor = ~pal(n), fillOpacity = 0.8,
              weight = 1, opacity = 1, color = "black",
              label = ~as.character(PREMISESNAME)) %>% 
  addLegend(pal = pal, values = ~n, opacity = 0.7, 
            title = 'Violent crimes', position = "bottomleft") 

It’s not the neatest of maps, with all these overlaps, but we can talk about prettifying maps another day. You’ve done enough today.

Recap

Today we learned to:

Homework

In the homework you will now apply your learning to a new problem. To do so, please complete the following tasks:

Geocode the addresses:

Use the geocode function to geocode, but this time pass the actual address values, rather than just the post code.

You can geocode from addresses. For example, if we want to geocode the location of Shoryu ramen, we could use the address to do so:

geocode("1 Piccadilly, Manchester M1 1RG", source = "dsk")
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=1%20Piccadilly,%20Manchester%20M1%201RG&sensor=false
##         lon      lat
## 1 -2.235015 53.48068

You can see that this is slightly different than the postcode centroid:

geocode("M1 1RG", source = 'dsk')
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=M1%201RG&sensor=false
##         lon      lat
## 1 -2.235001 53.48007
  • 1) Discuss why you think the geocode for the address is different from the geocode for the postcode.
  • 2) Geocode the addresses for all the city centre licensed premises.
  • 3) Did this work OK? Do you have any points geocoded to weird locations? Why do you think this might be?
  • 4) Fix any issues with geocoding that you identified above.

Make sure you think a little bit about this. Not only do you have to consider what variable to pass to the geocode() function, but also whether that’s enough information to get the relevant coordinates.
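For instance, you might want to build a fuller address string by pasting several columns together before geocoding. A toy base-R sketch (these column names are invented for illustration; check names() on your own data to see what is actually available):

```r
# Toy data: build a single address string out of separate columns
# (column names here are made up for illustration)
prems <- data.frame(name     = "The Example Arms",
                    street   = "1 Piccadilly",
                    city     = "Manchester",
                    postcode = "M1 1RG")

# paste() glues the pieces together with a chosen separator
prems$full_address <- paste(prems$street, prems$city, prems$postcode,
                            sep = ", ")
prems$full_address  # "1 Piccadilly, Manchester, M1 1RG"
```

A string like this, with street, city, and postcode, gives the geocoder much more to work with than a bare postcode.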